The goal of this project is to use a model-based approach to recognize human facial emotion from images, since deep learning architectures allow automatic feature extraction and classification.
FER2013: Publicly available online, it contains 32298 grayscale, labeled images in a consistent format, all of uniform dimension 48x48. Facial emotions are highlighted by centering the face within each frame.
KDEF: There are 4900 pictures of 562x762 pixels, covering 70 individuals between 20 and 30 years old; each individual displays 7 different expressions, and each expression is photographed from 5 different angles.
For the FER2013 dataset
For the KDEF dataset
The model that has demonstrated the greatest accuracy in facial emotion image recognition is the Deep-Emotion model, evaluated on the FER2013, CK+ and FERG databases for 7 classes: Angry, Disgust, Fear, Happy, Sad, Surprised and Neutral. Below are the emotion recognition classification accuracies of the proposed model on the FER2013, CK+ and FERG datasets in comparison to other models, without any extra training data; they show that it is competitive with, and outperforms, other expression recognition models.
The proposed facial recognition system uses the pre-trained deep CNN model VGG-16, originally trained to classify 1000 object categories, with the KDEF and JAFFE datasets. The first CNN layer captures simple features such as edges and corners of the image; the following CNN layer detects more complex features such as shapes, and the upper layers follow the same pattern to learn increasingly complex features. The pre-trained model is modified for emotion recognition by redefining the dense layers, and then fine-tuning is performed with emotion data. The last dense layers of the pre-trained model are replaced with new dense layers that classify a facial image into one of 7 emotion classes (afraid, angry, disgusted, sad, happy, surprised, and neutral). Fine-tuning is performed on the architecture consisting of the convolutional base of the pre-trained model plus the added dense layers. Data preprocessing includes resizing, cropping, and other tasks used to prepare the training data for fine-tuning.
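The head-replacement step described above can be sketched in Keras as follows. This is a minimal illustration, not the authors' exact code: the dense-layer width (256), the dropout rate, and the 48x48 input size are assumptions, and `weights=None` is used here only to avoid downloading the ImageNet weights that the real experiments would load with `weights="imagenet"`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Load the VGG16 convolutional base without its original 1000-class head.
# The actual experiments would use weights="imagenet"; None keeps this sketch offline.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(48, 48, 3))
base.trainable = False  # freeze the convolutional base before fine-tuning

# Replace the original dense layers with new ones for the 7 emotion classes.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # illustrative width, not from the report
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # afraid, angry, disgusted, sad, happy, surprised, neutral
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

After the new head converges, fine-tuning proceeds by unfreezing some or all of the convolutional base and continuing training at a lower learning rate.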
2 convolution blocks (convolution layer + pooling layer), followed by a convolution layer and a fully connected layer to classify the 7 classes
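The baseline architecture above can be sketched as a small Keras model. The filter counts and kernel sizes here are illustrative assumptions; only the layer layout (two conv+pool blocks, one extra conv layer, one fully connected classifier) follows the description.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the baseline CNN; filter counts are assumptions.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),          # FER2013 images are 48x48 grayscale
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                    # block 1: convolution + pooling
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                    # block 2: convolution + pooling
    layers.Conv2D(128, 3, activation="relu", padding="same"),  # extra convolution layer
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),    # fully connected classifier, 7 classes
])
```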
We trained for 50 epochs instead of 30 and added a dropout layer before the last layer
Same experiment as update 2, but on the KDEF dataset instead of the FER2013 dataset
Added a dropout layer to the previous experiment to reduce overfitting
Froze the first 3 layers and unfroze the 4th and 5th layers
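Selective freezing like this is done in Keras by toggling each layer's `trainable` flag. A minimal sketch, assuming the "layers" being frozen are VGG16's five convolutional blocks (whose layer names are prefixed `block1_` through `block5_`); `weights=None` again stands in for the pre-trained weights to keep the sketch offline.

```python
import tensorflow as tf

# Assumption: "first 3 layers" refers to VGG16 blocks 1-3.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(48, 48, 3))
for layer in base.layers:
    # Keep blocks 4 and 5 trainable; freeze everything else.
    layer.trainable = layer.name.startswith(("block4", "block5"))
```

Only the unfrozen blocks receive gradient updates during fine-tuning, which adapts the most task-specific features while preserving the generic low-level filters.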
Trained the full VGG16 network by unfreezing all the layers
Unfroze layers starting from the conv2d_260 CNN layer until the end of the pre-trained model
Froze layers up to the res5a_branch2a layer, then unfroze the rest
Froze the first 3 layers and unfroze the last 2 layers with augmentation, and changed the optimizer to SGD with a learning rate of 5*10^-3, momentum = 0.8, and decay = 0.0005
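The SGD configuration above can be expressed in current Keras as follows. The legacy `decay` argument corresponded to an inverse-time learning-rate schedule, lr_t = lr / (1 + decay * t), which newer versions express with an explicit schedule object; mapping `decay=0.0005` onto `InverseTimeDecay` is my interpretation, not stated in the report.

```python
import tensorflow as tf

# Reproduce the legacy `decay` behaviour with an explicit schedule:
# lr_t = initial_lr / (1 + decay_rate * t / decay_steps)
schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=5e-3,  # 5*10^-3 as in the experiment
    decay_steps=1,
    decay_rate=0.0005)           # the reported decay value

optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.8)
```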
Average of 3 models:
Weighted Average of 3 models:
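Both ensembling schemes above combine the per-model softmax outputs; the plain average is the weighted average with uniform weights. A small NumPy sketch (the helper name and the toy weights are mine, not from the report):

```python
import numpy as np

def weighted_ensemble(prob_list, weights=None):
    """Combine per-model class probabilities; uniform weights give the plain average."""
    probs = np.stack(prob_list)                  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalise so outputs stay probabilities
    return np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)

# Toy softmax outputs from 3 hypothetical models on one sample, 3 classes:
p = [np.array([[0.6, 0.3, 0.1]]),
     np.array([[0.2, 0.5, 0.3]]),
     np.array([[0.1, 0.2, 0.7]])]
avg = weighted_ensemble(p)                    # plain average of the 3 models
wavg = weighted_ensemble(p, [0.5, 0.3, 0.2]) # weighted average of the 3 models
```

Weights are typically chosen in proportion to each model's validation accuracy, so stronger models contribute more to the final prediction.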
Categorical training accuracy is 96.4%, while categorical validation accuracy is 55.9%, indicating high overfitting
Higher categorical training accuracy of 97.1% and validation accuracy of 59.3%; it also overfits
Higher accuracy than the one reported in the paper
Validation accuracy decreased
Accuracy: 97.53%
Accuracy: 14.06%
Accuracy: 99.56%
65.37% Training accuracy
Validation Accuracy of 87.47%
Validation Accuracy 54.28%
Validation Accuracy 97.2%
Key observations from the training experiments:
-Performing image augmentation improved the validation accuracy, but at the expense of slightly lowering the training accuracy
-Overfitting occurs when training on the FER2013 data even with regularization techniques, but training on the KDEF dataset performed better and produced higher validation accuracy with very minor overfitting.
-Pre-trained VGG16 and pre-trained InceptionV3 with fine-tuned layers performed well on both validation and training data, but ResNet performed poorly and underfit the training data.
-For future work, we can deploy the model in real time and further fine-tune the parameters. We can also experiment with more datasets and find ways to handle obstacles such as varying illumination.
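The image augmentation mentioned in the observations can be sketched with Keras preprocessing layers. The specific transforms and their ranges here are assumptions for illustration, not the exact pipeline used in the experiments.

```python
import numpy as np
import tensorflow as tf

# Illustrative augmentation pipeline; transforms and ranges are assumptions.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirrored faces keep the same emotion label
    tf.keras.layers.RandomRotation(0.1),       # up to ±10% of a full turn
    tf.keras.layers.RandomZoom(0.1),
])

# Apply to a fake FER2013-sized batch; training=True enables the random transforms.
batch = np.random.rand(4, 48, 48, 1).astype("float32")
augmented = augment(batch, training=True)
```

Because the transforms are applied on the fly each epoch, the network sees a slightly different version of every image, which acts as a regularizer and explains the improved validation accuracy at a small cost in training accuracy.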